
    Rapid Sampling for Visualizations with Ordering Guarantees

    Visualizations are frequently used as a means to understand trends and gather insights from datasets, but often take a long time to generate. In this paper, we focus on the problem of rapidly generating approximate visualizations while preserving crucial visual properties of interest to analysts. Our primary focus will be on sampling algorithms that preserve the visual property of ordering; our techniques will also apply to some other visual properties. For instance, our algorithms can be used to generate an approximate visualization of a bar chart very rapidly, where the comparisons between any two bars are correct. We formally show that our sampling algorithms are generally applicable and provably optimal in theory, in that they do not take more samples than necessary to generate the visualizations with ordering guarantees. They also work well in practice, correctly ordering output groups while taking orders of magnitude fewer samples and much less time than conventional sampling schemes. Comment: Tech Report. 17 pages. Condensed version to appear in VLDB Vol. 8 No.
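
    The core guarantee can be illustrated with a confidence-interval stopping rule: keep drawing samples per group until the intervals around the running means are pairwise disjoint, at which point the visual ordering is settled with high probability. The sketch below is a minimal illustration of that idea, assuming values in [0, 1] and Hoeffding-style bounds; `ordered_estimates` and its parameters are illustrative names, not the paper's actual algorithm (which, among other refinements, stops sampling groups whose order is already resolved).

    ```python
    import math

    def ordered_estimates(groups, delta=0.05, batch=100, max_rounds=1000):
        """Draw samples from every group until the confidence intervals
        around the running means are pairwise disjoint, so the estimated
        ordering is correct with probability at least 1 - delta.
        Assumes values lie in [0, 1] (Hoeffding bound)."""
        stats = {g: {"n": 0, "sum": 0.0} for g in groups}

        def interval(g):
            s = stats[g]
            mean = s["sum"] / s["n"]
            # Hoeffding half-width, union-bounded over groups and rounds.
            eps = math.sqrt(math.log(2 * len(groups) * max_rounds / delta)
                            / (2 * s["n"]))
            return mean - eps, mean + eps

        for _ in range(max_rounds):
            for g, sample in groups.items():     # groups: name -> sampler fn
                for _ in range(batch):
                    stats[g]["n"] += 1
                    stats[g]["sum"] += sample()  # one draw from the group
            ivals = sorted(interval(g) for g in groups)
            if all(ivals[i][1] < ivals[i + 1][0] for i in range(len(ivals) - 1)):
                break  # all intervals disjoint: the ordering is settled
        return {g: stats[g]["sum"] / stats[g]["n"] for g in groups}
    ```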

    Cabernet: A Content Delivery Network for Moving Vehicles

    This paper describes the design, implementation, and evaluation of Cabernet, a system to deliver data to and from moving vehicles using open 802.11 (WiFi) access points encountered opportunistically during travel. Network connectivity in Cabernet is both fleeting (access points are typically within range for a few seconds) and intermittent (because the access points don't provide continuous coverage), and suffers from high packet loss rates over the wireless channel. On the positive side, in the absence of losses, achievable data rates over WiFi can reach many megabits per second. Unfortunately, current protocols don't establish end-to-end connectivity fast enough, don't cope well with intermittent connectivity, and don't handle high packet loss rates well enough to achieve this potential throughput. Cabernet incorporates two new techniques to improve data delivery throughput: QuickWifi, a streamlined client-side process that establishes end-to-end connectivity quickly, reducing the mean time to establish connectivity from 12.9 seconds to less than 366 ms; and CTP, a transport protocol that distinguishes congestion on the wired portion of the path from losses over the wireless link to reliably and efficiently deliver data to nodes in cars. We have deployed the system on a fleet of 10 taxis, each running several hours per day in the Boston area. Our experiments show that CTP improves throughput by a factor of 2x over TCP and that QuickWifi increases the number of connections by a factor of 4x over unoptimized approaches. Thus, Cabernet is perhaps the first practical system capable of delivering data to moving vehicles over existing short-range WiFi radios, with a mean transfer capacity of approximately 38 megabytes/hour per car, or a mean rate of 87 kbit/s.
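
    A rough sketch of the aggressive-timeout idea behind fast connection setup: each stage of association gets a tight budget, and the client abandons the access point at the first stall, because contact windows last only seconds. The stage names, budget values, and `run_stage` driver hook below are assumptions for illustration, not Cabernet's actual QuickWifi implementation.

    ```python
    import time

    # Illustrative stage budgets in seconds; the real QuickWifi values differ.
    STAGES = [("scan", 0.1), ("associate", 0.1), ("dhcp", 0.4), ("probe", 0.1)]

    def quickwifi_connect(ap, run_stage):
        """Run each setup stage under a tight timeout and abandon the
        access point at the first stall: with only a few seconds of
        contact time, waiting out a slow stage costs the whole window."""
        for stage, budget in STAGES:
            start = time.monotonic()
            ok = run_stage(ap, stage, budget)  # caller-supplied driver hook
            if not ok or time.monotonic() - start > budget:
                return False  # give up, rescan for the next AP
        return True
    ```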

    Lingua Manga: A Generic Large Language Model Centric System for Data Curation

    Data curation is a wide-ranging area comprising many critical but time-consuming data processing tasks. However, the diversity of such tasks makes it challenging to develop a general-purpose data curation system. To address this issue, we present Lingua Manga, a user-friendly and versatile system that utilizes pre-trained large language models. Lingua Manga offers automatic optimization for achieving high performance and label efficiency while facilitating flexible and rapid development. Through three example applications with distinct objectives and users of varying levels of technical proficiency, we demonstrate that Lingua Manga can effectively assist both skilled programmers and low-code or even no-code users in addressing data curation challenges. Comment: 4 pages, 6 figures, VLDB 2023 Demo paper
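
    To make the LLM-centric approach concrete, here is a minimal sketch of one such curation task (entity matching) phrased as a prompt instead of hand-written matching rules; `call_llm` is an assumed injection point for any model client, not Lingua Manga's actual API.

    ```python
    def entity_match(record_a: dict, record_b: dict, call_llm) -> bool:
        """Decide whether two records refer to the same real-world
        entity by asking a pre-trained LLM, instead of writing
        task-specific matching rules."""
        prompt = ("Do these two records describe the same entity? "
                  "Answer yes or no.\n"
                  f"Record A: {record_a}\nRecord B: {record_b}")
        return call_llm(prompt).strip().lower().startswith("yes")

    # Usage, with any model client injected as call_llm:
    # entity_match({"name": "J. Smith", "city": "Boston"},
    #              {"name": "John Smith", "city": "Boston, MA"},
    #              call_llm=my_model)
    ```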

    Attendee-Sourcing: Exploring The Design Space of Community-Informed Conference Scheduling

    Constructing a good conference schedule for a large multi-track conference needs to take into account the preferences and constraints of organizers, authors, and attendees. Creating a schedule that has fewer conflicts for authors and attendees, and thematically coherent sessions, is a challenging task. Cobi introduced an alternative approach to conference scheduling by engaging the community to play an active role in the planning process. The current Cobi pipeline consists of committee-sourcing and author-sourcing to plan a conference schedule. We further explore the design space of community-sourcing by introducing attendee-sourcing -- a process that collects input from conference attendees and encodes it as preferences and constraints for creating sessions and a schedule. For CHI 2014, a large multi-track conference in human-computer interaction with more than 3,000 attendees and 1,000 authors, we collected attendees' preferences by making all the accepted papers available on Confer, a paper recommendation tool we built, for a period of 45 days before announcing the conference program (sessions and schedule). We compare the preferences marked on Confer with the preferences collected from Cobi's author-sourcing approach. We show that attendee-sourcing can provide insights beyond what can be discovered by author-sourcing. For CHI 2014, the results show value in the method and in attendees' participation. It produces data that provides more alternatives in scheduling and complements data collected from other methods for creating coherent sessions and reducing conflicts. Comment: HCOMP 2014
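
    As a minimal illustration of what attendee preference data enables (not Cobi's or Confer's actual objective function), the sketch below counts scheduling conflicts: pairs of papers one attendee wants to see that run in parallel timeslots.

    ```python
    from itertools import combinations

    def attendee_conflicts(preferences, slot_of):
        """Count the pairs of papers an attendee wants to see that are
        scheduled into the same timeslot (i.e., parallel tracks) -- the
        kind of conflict attendee preference data lets a scheduler
        minimize."""
        conflicts = 0
        for liked in preferences.values():    # attendee -> set of paper ids
            for a, b in combinations(sorted(liked), 2):
                if slot_of[a] == slot_of[b]:  # both run at the same time
                    conflicts += 1
        return conflicts
    ```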

    Optimizing Query Predicates with Disjunctions for Column Stores

    Since its inception, database research has given limited attention to optimizing predicates with disjunctions, and what little past work there is has focused on optimizations for traditional row-oriented databases. A key difference in predicate evaluation between row stores and column stores is that while row stores apply predicates to one record at a time, column stores apply predicates to sets of records. Not only must the execution engine decide the order in which to apply the predicates, but it must also decide how many times each predicate should be applied and to which sets of records it should be applied. In our work, we tackle exactly this problem. We formulate, analyze, and solve the predicate evaluation problem for column stores. Our results include proofs about various properties of the problem, and in turn, these properties have allowed us to derive the first polynomial-time (i.e., O(n log n)) algorithm, ShallowFish, which evaluates predicates optimally for all predicate expressions with a depth of 2 or less. We identify the exact property that makes the problem more difficult for predicate expressions of depth 3 or greater and propose an approximate algorithm, DeepFish, which outperforms ShallowFish in these situations. Finally, we show that both ShallowFish and DeepFish outperform the corresponding state of the art by two orders of magnitude.
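
    The set-at-a-time model is easiest to see for a pure conjunction, where a classic selectivity-ordering heuristic already works; the sketch below is illustrative, not the paper's algorithm. Choosing the application order and record sets optimally once disjunctions nest is exactly the harder problem ShallowFish and DeepFish address.

    ```python
    def eval_conjunction(predicates, record_ids, columns):
        """Set-at-a-time evaluation of a conjunction in a column store:
        each predicate runs once, only over the records that survived
        the previous predicates, ordered most-selective-first.
        predicates: list of (column_name, fn, estimated_selectivity)."""
        for col, fn, _ in sorted(predicates, key=lambda p: p[2]):
            record_ids = {r for r in record_ids if fn(columns[col][r])}
            if not record_ids:
                break  # short-circuit: nothing left to filter
        return record_ids
    ```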

    Outlier Detection in Heterogeneous Datasets using Automatic Tuple Expansion

    Rapidly developing areas of information technology are generating massive amounts of data. Human errors, sensor failures, and other unforeseen circumstances unfortunately tend to undermine the quality and consistency of these datasets by introducing outliers -- data points that exhibit surprising behavior when compared to the rest of the data. Characterizing, locating, and in some cases eliminating these outliers offers interesting insights into the data under scrutiny and reinforces the confidence that one may have in conclusions drawn from otherwise noisy datasets. In this paper, we describe a tuple expansion procedure which reconstructs rich information from semantically poor SQL data types such as strings, integers, and floating point numbers. We then use this procedure as the foundation of a new user-guided outlier detection framework, dBoost, which relies on inference and statistical modeling of heterogeneous data to flag suspicious fields in database tuples. We show that this novel approach achieves good classification performance, both in traditional numerical datasets and in highly non-numerical contexts such as mostly textual datasets. Our implementation is publicly available, under version 3 of the GNU General Public License.
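
    As a flavor of what tuple expansion buys, the sketch below expands an integer into date-derived features and flags values far from each feature's mean under a simple Gaussian model. This is a minimal sketch of the idea only: dBoost's actual expansion rules, statistical models, and user guidance are richer.

    ```python
    import statistics
    from datetime import datetime, timezone

    def expand(value):
        """Tuple expansion: reconstruct richer features from a
        semantically poor SQL integer (here treated as a Unix
        timestamp)."""
        dt = datetime.fromtimestamp(value, tz=timezone.utc)
        return {"raw": float(value), "hour": float(dt.hour),
                "weekday": float(dt.weekday())}

    def flag_outliers(values, k=3.0):
        """Flag rows whose expanded features fall more than k standard
        deviations from that feature's mean."""
        if not values:
            return set()
        rows = [expand(v) for v in values]
        flagged = set()
        for feat in rows[0]:
            col = [r[feat] for r in rows]
            mu, sigma = statistics.mean(col), statistics.pstdev(col)
            if sigma > 0:
                flagged |= {i for i, x in enumerate(col)
                            if abs(x - mu) > k * sigma}
        return flagged
    ```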

    Exercise prehabilitation in elective intra-cavity surgery: A role within the ERAS pathway? A narrative review

    The Enhanced Recovery after Surgery (ERAS) model integrates several elements of perioperative care into a standardised clinical pathway for surgical patients. ERAS programmes aim to reduce the rate of complications, improve surgical recovery, and limit postoperative length of hospital stay (LOHS). One area of growing interest that is not currently included within ERAS protocols is the use of exercise prehabilitation (PREHAB) interventions. PREHAB refers to the systematic process of improving the functional capacity of the patient to withstand the upcoming physiological stress of surgery. A number of recent systematic reviews have examined the role of PREHAB prior to elective intra-cavity surgery. However, the results have been conflicting, and a definitive conclusion has not been reached. Furthermore, a summary of the research area focussing exclusively on the therapeutic potential of exercise prior to intra-cavity surgery is yet to be undertaken. Clarification is required to better inform perioperative care and advance the research field. Therefore, this “review of reviews” provides a critical overview of currently available evidence on the effect of exercise PREHAB in patients undergoing i) coronary artery bypass graft surgery (CABG), ii) lung resection surgery, and iii) gastrointestinal and colorectal surgery. We discuss the findings of systematic reviews and meta-analyses and supplement these with recently published clinical trials. This article summarises the research findings and identifies pertinent gaps in the research area that warrant further investigation. Finally, studies are conceptually synthesised to discuss the feasibility of PREHAB in clinical practice and its potential role within the ERAS pathway.

    WaveScript: A Case-Study in Applying a Distributed Stream-Processing Language

    Applications that combine live data streams with embedded, parallel, and distributed processing are becoming more commonplace. WaveScript is a domain-specific language that brings high-level, type-safe, garbage-collected programming to these domains. This is made possible by three primary implementation techniques. First, we employ a novel evaluation strategy that uses a combination of interpretation and reification to partially evaluate programs into stream dataflow graphs. Second, we use profile-driven compilation to enable many optimizations that are normally only available in the synchronous (rather than asynchronous) dataflow domain. Finally, we incorporate an extensible system for rewrite rules to capture algebraic properties in specific domains (such as signal processing). We have used our language to build and deploy a sensor-network for the acoustic localization of wild animals, in particular, the Yellow-Bellied marmot. We evaluate WaveScript's performance on this application, showing that it yields good performance on both embedded and desktop-class machines, including distributed execution and substantial parallel speedups. Our language allowed us to implement the application rapidly, while outperforming a previous C implementation by over 35%, using fewer than half the lines of code. We evaluate the contribution of our optimizations to this success.
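
    The interpretation-plus-reification strategy can be illustrated in a few lines of Python (WaveScript itself is its own language; this is only a sketch of the idea): "running" the program at compile time builds a dataflow graph instead of processing any data.

    ```python
    class Stream:
        """A reified stream: applying an operator does no processing at
        'compile time' -- it only records a node in a dataflow graph
        that a backend can later optimize, partition, and execute."""
        def __init__(self, op, inputs=()):
            self.op, self.inputs = op, inputs

        def map(self, fn):
            return Stream(("map", fn), (self,))

        def filter(self, pred):
            return Stream(("filter", pred), (self,))

    source = Stream(("source", "microphone"))
    graph = source.map(lambda x: x * 2).filter(lambda x: x > 0.5)
    # 'graph' is a dataflow graph, not a running computation: the
    # compiler is free to split it across embedded and desktop nodes.
    ```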

    Machine-Assisted Map Editing

    Mapping road networks today is labor-intensive. As a result, road maps have poor coverage outside urban centers in many countries. Systems to automatically infer road network graphs from aerial imagery and GPS trajectories have been proposed to improve coverage of road maps. However, because of high error rates, these systems have not been adopted by mapping communities. We propose machine-assisted map editing, where automatic map inference is integrated into existing, human-centric map editing workflows. To realize this, we build Machine-Assisted iD (MAiD), where we extend the web-based OpenStreetMap editor, iD, with machine-assistance functionality. We complement MAiD with a novel approach for inferring road topology from aerial imagery that combines the speed of prior segmentation approaches with the accuracy of prior iterative graph construction methods. We design MAiD to tackle the addition of major, arterial roads in regions where existing maps have poor coverage, and the incremental improvement of coverage in regions where major roads are already mapped. We conduct two user studies and find that, when participants are given a fixed time to map roads, they are able to add as much as 3.5x more roads with MAiD.
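
    A minimal sketch of the machine-assisted workflow: rank automatically inferred segments and surface only the best few for human validation, rather than committing them to the map unsupervised. The `existing_graph.contains` check and the segment fields are assumed shapes for illustration, not MAiD's actual interface.

    ```python
    def suggest_roads(inferred_segments, existing_graph, k=5):
        """Rank automatically inferred road segments and surface only
        the top k to a human editor for one-click accept/reject,
        rather than committing them to the map automatically."""
        novel = [s for s in inferred_segments
                 if not existing_graph.contains(s["geometry"])]
        return sorted(novel, key=lambda s: s["confidence"], reverse=True)[:k]
    ```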